Lecture 9: Linear Bandits (Part II)

Author

  • Yuanjun Gao

Abstract

There exists an elliptical confidence region for w, as described in the following theorem.

Theorem 1. ([2], Theorem 2) Assuming ‖w‖ ≤ √d and ‖x_t‖ ≤ √d, with probability 1 − δ we have w ∈ C_t, where

    C_t = { z : ‖z − ŵ_t‖_{M_t} ≤ 2√(d log(Td/δ)) }.

For any x ∈ A, we define UCB_{x,t} = max_{z ∈ C_t} z′x; whenever w ∈ C_t (which holds with high probability), UCB_{x,t} is an upper bound on w′x. At each time step, the UCB algorithm then simply picks the arm with the highest UCB given all previous observations:

    x_t = argmax_{x ∈ A} UCB_{x,t−1} = argmax_{x ∈ A, z ∈ C_{t−1}} x′z.
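The inner maximization over the ellipsoid C_t has a closed form, max_{z ∈ C_t} z′x = ŵ′x + β‖x‖_{M⁻¹}, so the UCB rule above can be sketched in a few lines. This is an illustrative sketch, not the lecture's reference implementation; the names (`ucb_values`, `select_arm`) and the toy arm set are assumptions.

```python
import numpy as np

def ucb_values(arms, w_hat, M, beta):
    """UCB for each arm x: w_hat'x + beta * sqrt(x' M^{-1} x),
    the closed-form maximum of z'x over the ellipsoid ||z - w_hat||_M <= beta."""
    M_inv = np.linalg.inv(M)
    return np.array([x @ w_hat + beta * np.sqrt(x @ M_inv @ x) for x in arms])

def select_arm(arms, w_hat, M, beta):
    # pick the arm with the highest upper confidence bound
    return int(np.argmax(ucb_values(arms, w_hat, M, beta)))

# toy check: with M = I and beta = 0, the UCB reduces to the plain estimate w_hat'x
arms = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
w_hat = np.array([0.2, 0.7])
print(select_arm(arms, w_hat, np.eye(2), beta=0.0))  # prints 1
```

In the actual algorithm, β would be set to the theorem's radius 2√(d log(Td/δ)) and M_t to the regularized design matrix built from past actions.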


Related articles

Lecture 9: (Semi-)bandits and experts with linear costs (Part I)

In this lecture, we will study bandit problems with linear costs. In this setting, actions are represented by vectors in a low-dimensional real space. For simplicity, we will assume that all actions lie within a unit hypercube: a ∈ [0, 1]^d. The action costs c_t(a) are linear in the vector a, namely c_t(a) = a · v_t for some weight vector v_t ∈ R^d which is the same for all actions, but depends on t...
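The linear cost structure c_t(a) = a · v_t means one weight vector per round prices every action in [0, 1]^d. A minimal sketch, with made-up numbers for v_t and the action set:

```python
import numpy as np

# One weight vector v_t determines the cost of every action this round.
# These values are illustrative, not from the lecture.
v_t = np.array([0.5, 0.1, 0.9])
actions = np.array([[1.0, 0.0, 0.0],   # some corners of the unit hypercube
                    [0.0, 1.0, 0.0],
                    [1.0, 1.0, 1.0]])
costs = actions @ v_t                  # c_t(a) = a . v_t for each action
print(costs)                           # [0.5 0.1 1.5]
```

Because costs are linear, observing the cost of a few actions can reveal information about the cost of every action, which is what the semi-bandit analysis exploits.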


Lecture 2: Bandits with i.i.d. rewards (Part II)

So far we’ve discussed non-adaptive exploration strategies. Now let’s talk about adaptive exploration, in the sense that the bandit feedback from different arms in previous rounds is fully utilized. Let’s start with 2 arms. One fairly natural idea is to alternate them until we find that one arm is much better than the other, at which time we abandon the inferior one. But how to define “one arm is ...
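The alternate-and-eliminate idea can be sketched as follows, using a Hoeffding-style confidence radius r(n) = √(2 log T / n) to decide when one arm is significantly better. The arm means, the radius constant, and the function names are illustrative assumptions, not taken from the lecture.

```python
import math
import random

def alternate_and_eliminate(pull, T):
    """Alternate two arms; commit to the better one once the empirical
    gap exceeds the combined confidence radius 2 * sqrt(2 log T / n)."""
    sums, counts = [0.0, 0.0], [0, 0]
    best = 0
    for _ in range(T // 2):
        for a in (0, 1):
            sums[a] += pull(a)
            counts[a] += 1
        means = [sums[a] / counts[a] for a in (0, 1)]
        best = int(means[1] > means[0])
        r = math.sqrt(2 * math.log(T) / counts[0])
        if abs(means[0] - means[1]) > 2 * r:   # gap is statistically significant
            return best                        # abandon the inferior arm
    return best

# toy run with Bernoulli arms of means 0.1 and 0.9 (illustrative values)
rng = random.Random(0)
pull = lambda a: float(rng.random() < (0.9 if a == 1 else 0.1))
print(alternate_and_eliminate(pull, T=10_000))  # commits to arm 1
```

With a large gap the elimination triggers after only a few hundred pulls, which is the adaptivity gain over non-adaptive explore-first strategies.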


A Survey on Contextual Multi-armed Bandits

4 Stochastic Contextual Bandits — 6
  4.1 Stochastic Contextual Bandits with Linear Realizability Assumption — 6
    4.1.1 LinUCB/SupLinUCB — 6
    4.1.2 LinREL/SupLinREL — 9
    4.1.3 CofineUCB — 11
    4.1.4 Thompson Sampling with Linear Payoffs...


Asymptotic optimal control of multi-class restless bandits

We study the asymptotic optimal control of multi-class restless bandits. A restless bandit is a controllable process whose state evolution depends on whether or not the bandit is made active. The aim is to find a control that determines at each decision epoch which bandits to make active in order to minimize the overall average cost associated with the states the bandits are in. Sinc...


Asymptotically optimal priority policies for indexable and non-indexable restless bandits

We study the asymptotic optimal control of multi-class restless bandits. A restless bandit is a controllable stochastic process whose state evolution depends on whether or not the bandit is made active. Since finding the optimal control is typically intractable, we propose a class of priority policies that are proved to be asymptotically optimal under a global attractor property an...



Publication date: 2016